    Within-Speaker Features for Native Language Recognition in the Interspeech 2016 Computational Paralinguistics Challenge

    The Interspeech 2016 Native Language recognition challenge was to identify the first language of 867 speakers from their spoken English. Effectively, this was an L2 accent recognition task where the L1 was one of eleven languages. The lack of transcripts of the spontaneous speech recordings meant that the currently best-performing accent recognition approach (ACCDIST), developed by the author, could not be applied. Instead, the objective of this study was to explore whether within-speaker features found to be effective in ACCDIST would also have value within a contemporary GMM-based accent recognition approach. We show that while Gaussian mean supervectors provide the best performance on this task, small gains may be had by fusing the mean supervector system with a system based on within-speaker Gaussian mixture distances.
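
    A minimal numpy sketch of the two representations compared above, under stated assumptions: a diagonal-covariance universal background model (UBM) whose means are MAP-adapted to each speaker with a relevance factor of 16 (a common default, assumed here); a mean supervector formed by simply stacking the adapted means (many systems also scale by weights and variances); and "within-speaker distances" taken, hypothetically, as Euclidean distances between pairs of one speaker's adapted component means. The abstract does not say which distance the paper actually uses, and all function names are illustrative.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log-likelihood of each frame under each diagonal Gaussian.
    X: (T, D) frames; means, variances: (K, D). Returns (T, K)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                     # (T, K, D)
    return -0.5 * (D * np.log(2 * np.pi)
                   + np.sum(np.log(variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """MAP-adapt the UBM means (means only) to one speaker's frames."""
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)                    # numerical stability
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)                      # component posteriors (T, K)
    n_k = resp.sum(axis=0)                                       # soft frame counts (K,)
    ex = (resp.T @ X) / np.maximum(n_k, 1e-10)[:, None]          # per-component data means (K, D)
    alpha = (n_k / (n_k + relevance))[:, None]                   # data vs prior weighting
    return alpha * ex + (1.0 - alpha) * means

def mean_supervector(adapted_means):
    """Stack the K adapted means into a single K*D vector."""
    return adapted_means.reshape(-1)

def within_speaker_distances(adapted_means):
    """Hypothetical within-speaker feature: distances between every pair of
    one speaker's adapted means (upper triangle, K*(K-1)/2 values)."""
    diff = adapted_means[:, None, :] - adapted_means[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=2))
    iu = np.triu_indices(len(adapted_means), k=1)
    return dist[iu]
```

    Because the distance vector compares a speaker's sounds with each other rather than with a reference, it is less sensitive to voice differences that shift all of a speaker's sounds together, which is the motivation for within-speaker features in ACCDIST.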

    Using Web Audio To Deliver Interactive Speech Tools In The Browser

    In 2014, the number of web pages delivered to tablets and smartphones overtook the number delivered to laptop and desktop computers, with a majority of users saying they prefer these new portable platforms over conventional computers for many tasks. This shift in device use provides both opportunities and challenges for providers of speech analysis tools, phonetic demonstrations and language teaching aids. It is an opportunity because web standards mean we can make our applications available to a wide audience through a single consistent programming architecture rather than writing for one particular computing platform. It is a challenge because tablets and smartphones are less powerful, require different programming skills and have different user-interface limitations. In this article, I will show how interactive applications in Phonetics and Speech Science can be written to run in web browsers on any computing platform. These are native web applications, written in HTML, CSS and JavaScript, that can capture, replay, display, process, and analyze audio using the Web Audio API without needing any plugins. I will describe some demonstration applications and give their URLs. I will discuss some future opportunities in the area of collaborative research and some remaining challenges that arise from incompatibilities across browsers. My audience is teachers and students with intermediate web programming skills who want to build custom speech displays, perform custom speech analysis or run speech audio experiments over the web.

    A Comparison of Human and Machine Estimation of Speaker Age

    The estimation of the age of a speaker from his or her voice has both forensic and commercial applications. Previous studies have shown that human listeners are able to estimate the age of a speaker to within 10 years on average, while recent machine age estimation systems seem to show superior performance, with average errors as low as 6 years. However, the machine studies have used highly non-uniform test sets, for which knowledge of the age distribution offers considerable advantage to the system. In this study we compare human and machine performance on the same test data, chosen to be uniformly distributed in age. We show that in this case human and machine accuracy is more similar, with average errors of 9.8 and 8.6 years respectively, although if panels of listeners are consulted, human accuracy can be improved to a value closer to 7.5 years. Both humans and machines have difficulty in accurately predicting the ages of older speakers.
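
    The point about non-uniform test sets can be made concrete with a short sketch: a "system" that ignores the voice entirely and always guesses the test-set median already achieves a low mean absolute error (MAE) when most test speakers are of similar age. The age lists below are invented for illustration and are not data from the study; the panel function assumes, hypothetically, that a panel's answer is the simple mean of its listeners' estimates.

```python
from statistics import median

def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages, in years."""
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)

def constant_guess_mae(ages):
    """MAE of a voice-blind predictor that always answers the test-set median."""
    guess = median(ages)
    return mean_absolute_error(ages, [guess] * len(ages))

def panel_mae(true_ages, listener_estimates):
    """listener_estimates[i] holds several listeners' guesses for speaker i;
    the panel's answer is assumed to be their mean."""
    panel = [sum(g) / len(g) for g in listener_estimates]
    return mean_absolute_error(true_ages, panel)

skewed = [20, 21, 22, 23, 25, 30, 60]    # hypothetical: mostly young speakers
uniform = [20, 30, 40, 50, 60, 70, 80]   # hypothetical: ages spread evenly
print(round(constant_guess_mae(skewed), 2))   # 7.43
print(round(constant_guess_mae(uniform), 2))  # 17.14
```

    The same voice-blind guess looks more than twice as accurate on the skewed set, which is why a uniform age distribution gives a fairer comparison between humans and machines.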

    Two-level recognition of isolated word using neural nets

    Describes a neural-net based isolated word recogniser that performs better on a standard multi-speaker database than the reference hidden Markov model recogniser. The complete neural net recogniser is formed from two parts: a front-end which transforms the complex acoustic specification of the speech into a simplified phonetic feature specification, and a whole-word discriminator net. Each level was trained separately, thus considerably reducing the time needed to train the overall system.
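
    The two-level structure can be sketched with the simplest possible nets: a frame-level logistic layer mapping acoustic vectors to phonetic feature probabilities, and a softmax layer over whole words, each trained on its own targets. The single-layer nets, the averaging of front-end outputs over four equal time segments to get a fixed-length word vector, and all function names are assumptions for illustration; the abstract does not describe the actual architectures.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_front_end(frames, feature_targets, lr=0.5, epochs=300):
    """Level 1: frames (N, D) -> phonetic feature probabilities (N, F).
    Multi-label logistic layer trained by gradient descent on cross-entropy."""
    n, d = frames.shape
    W = np.zeros((d, feature_targets.shape[1]))
    b = np.zeros(feature_targets.shape[1])
    for _ in range(epochs):
        grad = (sigmoid(frames @ W + b) - feature_targets) / n
        W -= lr * frames.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def word_vector(frames, W, b, segments=4):
    """Fixed-length word representation: mean front-end output in each of
    `segments` equal time slices (needs at least `segments` frames)."""
    feats = sigmoid(frames @ W + b)
    return np.concatenate([c.mean(axis=0) for c in np.array_split(feats, segments)])

def train_word_net(vectors, labels, n_words, lr=0.5, epochs=300):
    """Level 2: softmax word discriminator trained only on front-end outputs."""
    n, d = vectors.shape
    onehot = np.eye(n_words)[labels]
    V = np.zeros((d, n_words))
    c = np.zeros(n_words)
    for _ in range(epochs):
        grad = (softmax(vectors @ V + c) - onehot) / n
        V -= lr * vectors.T @ grad
        c -= lr * grad.sum(axis=0)
    return V, c

def recognise(frames, W, b, V, c, segments=4):
    """Index of the most likely word for one utterance."""
    return int(np.argmax(word_vector(frames, W, b, segments) @ V + c))
```

    In this sketch the word-level net never sees the acoustic frames, only the front-end outputs, so the front end can be trained once and frozen while the word net is trained; that separation is where the training-time saving described above comes from.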

    Predicting fatigue and psychophysiological test performance from speech for safety-critical environments

    Automatic systems for estimating operator fatigue have application in safety-critical environments. A system that could estimate the level of fatigue from speech would have application in domains where operators engage in regular verbal communication as part of their duties. Previous studies on the prediction of fatigue from speech have been limited because of their reliance on subjective ratings and because they lack comparison to other methods for assessing fatigue. In this paper, we present an analysis of voice recordings and psychophysiological test scores collected from seven aerospace personnel during a training task in which they remained awake for 60 h. We show that voice features and test scores are affected by both the total time spent awake and the time position within each subject’s circadian cycle. However, we show that time spent awake and time-of-day information are poor predictors of the test results, while voice features can give good predictions of the psychophysiological test scores and sleep latency. Mean absolute prediction errors of about 17.5% for sleep latency and 5–12% for test scores are achievable. We discuss the implications for the use of voice as a means to monitor the effects of fatigue on cognitive performance in practical applications.
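
    One way to set up the kind of evaluation reported above is a least-squares regression from per-session voice features to a test score, scored with leave-one-subject-out cross-validation so that no subject's own data is used to predict that subject's scores. The linear model, the cross-validation scheme and the choice to express MAE as a percentage of the mean true score are all assumptions; the abstract does not say how its percentages are normalised.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def mae_percent(y_true, y_pred):
    """MAE as a percentage of the mean true score (one possible normalisation)."""
    return 100.0 * np.mean(np.abs(y_true - y_pred)) / np.mean(np.abs(y_true))

def cross_validated_mae_percent(X, y, groups):
    """Leave-one-group-out: for each subject, train on all other subjects
    and predict the held-out subject's scores."""
    preds = np.empty_like(y, dtype=float)
    for g in np.unique(groups):
        test = groups == g
        coef = fit_linear(X[~test], y[~test])
        preds[test] = predict_linear(coef, X[test])
    return mae_percent(y, preds)
```

    Running the same function with time-awake or time-of-day columns in place of voice features gives a like-for-like comparison of predictors, which is the comparison the abstract reports.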

    It Sounds Like You Have a Cold! Testing Voice Features for the Interspeech 2017 Computational Paralinguistics Cold Challenge

    This paper describes an evaluation of four different voice feature sets for detecting symptoms of the common cold in speech as part of the Interspeech 2017 Computational Paralinguistics Challenge. The challenge corpus consists of 630 speakers in three partitions, of whom approximately one third had a “severe” cold at the time of recording. Success on the task is measured in terms of unweighted average recall of cold/not-cold classification from short extracts of the recordings. In this paper we review previous voice features used for studying changes in health and devise four basic types of features for evaluation: voice quality features, vowel spectra features, modulation spectra features, and spectral distribution features. The evaluation shows that each feature set provides some useful information to the task, with features from the modulation spectrogram being most effective. Feature-level fusion of the feature sets shows small performance improvements on the development test set. We discuss the results in terms of the most suitable features for detecting symptoms of cold and address issues arising from the design of the challenge.
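
    Unweighted average recall (UAR) is the mean of the per-class recalls, so a classifier that always answers "not cold" on data where only a minority of speakers have colds scores just 50% UAR, however high its plain accuracy. A small sketch of UAR, plus feature-level fusion written as simple per-utterance concatenation (an assumption about the fusion method; in practice each feature set is usually normalised first). Function names are illustrative.

```python
def unweighted_average_recall(y_true, y_pred, classes=None):
    """Mean of per-class recalls; each class counts equally regardless of size."""
    classes = sorted(set(y_true)) if classes is None else classes
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

def fuse_features(*feature_sets):
    """Feature-level fusion: concatenate each utterance's vectors from several sets."""
    return [sum((list(v) for v in vecs), []) for vecs in zip(*feature_sets)]

# Always predicting "NC" on 1 cold ("C") + 3 not-cold utterances:
# accuracy would be 0.75, but UAR is 0.5.
print(unweighted_average_recall(["C", "NC", "NC", "NC"], ["NC"] * 4))  # 0.5
```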

    X-ray emission from the Sombrero galaxy: discrete sources

    We present a study of discrete X-ray sources in and around the bulge-dominated, massive Sa galaxy Sombrero (M104), based on new and archival Chandra observations with a total exposure of ~200 ks. With a detection limit of L_X = 1E37 erg/s and a field of view covering a galactocentric radius of ~30 kpc (11.5 arcmin), 383 sources are detected. Cross-correlation with Spitler et al.'s catalogue of Sombrero globular clusters (GCs) identified from HST/ACS observations reveals 41 X-ray sources in GCs, presumably low-mass X-ray binaries (LMXBs). We quantify the differential luminosity functions (LFs) for both the detected GC and field LMXBs, whose power-law indices (~1.1 for the GC LF and ~1.6 for the field LF) are consistent with previous studies of elliptical galaxies. Using the precise sky positions of the GCs without a detected X-ray source, we further quantify, through a fluctuation analysis, the GC LF at fainter luminosities down to 1E35 erg/s. The derived index rules out a faint-end slope flatter than 1.1 at 2 sigma significance, contrary to recent findings in several elliptical galaxies and the bulge of M31. On the other hand, the 2-6 keV unresolved emission places a tight constraint on the field LF, implying a flattened index of ~1.0 below 1E37 erg/s. We also detect 101 sources in the halo of Sombrero. The presence of these sources cannot be interpreted as galactic LMXBs, whose spatial distribution empirically follows the starlight. Their number is also higher than the expected number of cosmic AGNs (52+/-11 [1 sigma]), whose surface density is constrained by deep X-ray surveys. We suggest that either the cosmic X-ray background is unusually high in the direction of Sombrero, or a distinct population of X-ray sources is present in the halo of Sombrero. Comment: 11 figures, 5 tables, ApJ in press
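
    The power-law indices quoted above describe a differential luminosity function dN/dL ∝ L^-alpha. A standard maximum-likelihood estimate of alpha from source luminosities above a limit L_min is alpha = 1 + n / sum(ln(L_i / L_min)), with standard error about (alpha - 1) / sqrt(n). This sketch assumes an untruncated power law with alpha > 1 and ignores detection incompleteness, so it is not the paper's fitting procedure; for indices near 1, like those here, a fit with an upper luminosity cut-off would be more appropriate. Function names are illustrative.

```python
import math
import random

def power_law_index_mle(luminosities, l_min):
    """ML index for dN/dL ∝ L^-alpha above l_min (untruncated, alpha > 1)."""
    xs = [l for l in luminosities if l >= l_min]
    return 1.0 + len(xs) / sum(math.log(l / l_min) for l in xs)

def power_law_index_stderr(alpha, n):
    """Approximate 1-sigma uncertainty of the ML index from n sources."""
    return (alpha - 1.0) / math.sqrt(n)

def sample_power_law(alpha, l_min, n, rng):
    """Draw n luminosities by inverse-transform sampling, for testing the estimator."""
    return [l_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]
```

    With only a few dozen sources, as for the GC sample above, the standard error from this formula is a sizeable fraction of (alpha - 1), so small differences between quoted indices should be read with that uncertainty in mind.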

    Performance of the CMS Cathode Strip Chambers with Cosmic Rays

    The Cathode Strip Chambers (CSCs) constitute the primary muon tracking device in the CMS endcaps. Their performance has been evaluated using data taken during a cosmic ray run in fall 2008. Measured noise levels are low, with the number of noisy channels well below 1%. Coordinate resolution was measured for all types of chambers and falls in the range 47 to 243 microns. The efficiencies for local charged track triggers, and for hit and segment reconstruction, were measured and are above 99%. The timing resolution per layer is approximately 5 ns.
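
    Efficiencies like those quoted above are pass/total ratios. A sketch of the usual binomial standard error, plus a Wilson score interval, which behaves better for efficiencies near 100%, where the simple binomial error shrinks toward zero. The counts in the example are hypothetical, and the abstract does not say how its uncertainties were computed.

```python
import math

def efficiency(n_pass, n_total):
    """Efficiency and its simple binomial standard error."""
    eff = n_pass / n_total
    return eff, math.sqrt(eff * (1.0 - eff) / n_total)

def wilson_interval(n_pass, n_total, z=1.0):
    """Wilson score interval; z=1.0 gives roughly 68% (1 sigma) coverage.
    Stays inside [0, 1] even when every event passes."""
    p = n_pass / n_total
    denom = 1.0 + z * z / n_total
    centre = (p + z * z / (2 * n_total)) / denom
    half = z * math.sqrt(p * (1 - p) / n_total + z * z / (4 * n_total ** 2)) / denom
    return centre - half, centre + half

# Hypothetical: 9950 of 10000 cosmic-muon segments reconstructed.
eff, err = efficiency(9950, 10000)   # eff = 0.995, err ≈ 0.0007
```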

    Performance and Operation of the CMS Electromagnetic Calorimeter

    The operation and general performance of the CMS electromagnetic calorimeter using cosmic-ray muons are described. These muons were recorded after the closure of the CMS detector in late 2008. The calorimeter is made of lead tungstate crystals, and the overall status of the 75848 channels corresponding to the barrel and endcap detectors is reported. The stability of crucial operational parameters, such as high voltage, temperature and electronic noise, is summarised, and the performance of the light monitoring system is presented.